Fast outlier detection using rough sets theory

Authors

  • F. Shaari
  • A. A. Bakar
  • A. R. Hamdan
Abstract

In many knowledge discovery applications, finding the outliers in a dataset is more interesting than finding the inliers. Outliers are rare cases in a dataset and are described as abnormal data in the information table. Outlier detection is used in many important applications, such as fraud detection systems, to uncover suspicious objects that may carry important knowledge hidden in the system. This paper proposes a new outlier detection technique based on Rough Sets Theory (RST). RSetOF is a new RST-based measure of the outlier factor. Using this factor, a new formulation for detecting outliers is established, and the outlyingness of the objects in a dataset is identified. Two measurements are presented for assessing outlier detection: the top n ratio and the coverage ratio. Finding the top n outliers among all objects means searching for outliers among the top-ranked records, ranked by the least outlier factor value; the ability to detect outliers within the top n indicates how fast the detection is. The efficiency of the technique is then tested by obtaining the coverage ratio: the maximum percentage of coverage shows the maximum number of detected outliers that belong to rare cases. The performance of the proposed algorithm, RSetAlg, is compared with a selected outlier detection method, the frequent-pattern-based method referred to as FindFPOF, using ten benchmark datasets. The experimental results show that the proposed technique is competitive and detects outliers faster than FindFPOF. This fast and efficient detection demonstrates the potential of the RST-based approach as a new outlier detection technique.
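The abstract does not give the RSetOF formula, so the sketch below is only an illustration of the surrounding workflow under stated assumptions. The scoring function `indiscernibility_scores` is a hypothetical stand-in, not RSetOF: it uses the relative size of each record's indiscernibility class (rows identical on every attribute), so rare attribute combinations get low scores. The evaluation follows the abstract's ranking by least outlier factor, and it assumes the top n ratio and coverage ratio definitions common in the frequent-pattern outlier literature: top n ratio = n / number of records, and coverage ratio = rare-class records found in the top n / all rare-class records. Both function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def indiscernibility_scores(records: Sequence[Hashable]) -> list[float]:
    """HYPOTHETICAL stand-in for an RST outlier factor (this is NOT RSetOF).

    Two records are indiscernible when they agree on every attribute, so each
    record's equivalence class is the set of identical rows. The score is the
    relative size of that class: rare attribute combinations score low.
    """
    class_sizes = Counter(records)  # size of each indiscernibility class
    total = len(records)
    return [class_sizes[r] / total for r in records]


def top_n_and_coverage(
    scores: Sequence[float], is_rare: Sequence[bool], n: int
) -> tuple[float, float]:
    """Return (top_n_ratio, coverage_ratio) for the n lowest-scoring records.

    Assumed definitions: top_n_ratio = n / number of records;
    coverage_ratio = rare-class records among the top n / all rare-class records.
    """
    # Least outlier factor first, i.e. the most outlying records at the top.
    ranking = sorted(range(len(scores)), key=lambda i: scores[i])
    top_n = ranking[:n]
    total_rare = sum(is_rare)
    detected = sum(1 for i in top_n if is_rare[i])
    top_n_ratio = n / len(scores)
    coverage_ratio = detected / total_rare if total_rare else 0.0
    return top_n_ratio, coverage_ratio


if __name__ == "__main__":
    # Toy table with two categorical attributes; the last two rows are rare.
    table = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("c", "z")]
    rare = [False, False, False, False, True, True]
    scores = indiscernibility_scores(table)
    print(top_n_and_coverage(scores, rare, n=2))
```

With n = 2, the two rare rows have the smallest classes, so they are ranked first, giving a top n ratio of 2/6 and full coverage (1.0). A technique that reaches full coverage at a smaller n is "faster" in the abstract's sense.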

Similar resources

Rough K-means Outlier Factor Based on Entropy Computation

Many studies of outlier detection have been developed based on the cluster-based outlier detection approach, since it does not need any prior knowledge of the dataset. However, previous studies only consider the outlier factor computation with respect to a single point or a small cluster, which reflects how it deviates from a common cluster. Furthermore, all objects within an outlier cluster are as...

A Novel Approach for Outlier Detection using Rough Entropy

Outlier detection is an important task in data mining and its applications. An outlier is defined as a data point that is very different from the rest of the data according to some measure. Such data often contain useful information about abnormal behavior of the system described by the patterns. In this paper, a novel method for outlier detection in inconsistent datasets is proposed. This method expl...

Rough sets theory in site selection decision making for water reservoirs

Rough Sets theory is a mathematical approach for analyzing vague descriptions of objects, introduced by the well-known mathematician Pawlak (1982, 1991). This paper explores the use of Rough Sets theory in site location investigation of buried concrete water reservoirs. Making an appropriate site location decision can always avoid unnecessary expenses, which is very important in constr...

A New Approach for Knowledge Based Systems Reduction using Rough Sets Theory (RESEARCH NOTE)

The problem of knowledge analysis for decision support systems is the most difficult task of information systems. This paper presents a new approach based on notions of the mathematical theory of Rough Sets to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of the decision database and extract a reduced rule set from vague and uncertain data. The method ha...

Some issues about outlier detection in rough set theory

“One person’s noise is another person’s signal” (Knorr, E., Ng, R. (1998). Algorithms for mining distance-based outliers in large datasets. In Proceedings of the 24th VLDB conference, New York (pp. 392–403)). In recent years, much attention has been given to the problem of outlier detection, whose aim is to detect outliers – objects which behave in an unexpected way or have abnormal properties....

Publication date: 2008